188 American Statistical Association [48

THE STATISTICAL METHOD IN PROBLEMS OF WATER SUPPLY QUALITY

By Abel Wolman, Maryland State Department of Health

INTRODUCTORY

The concept of water supply quality has the simplicity of the unknown to the layman, but the complexity of the universe to the sanitarian. If one uses the mathematician's measure of the complexity of a function (the number of its attributes), the problem of water supply quality, a function dependent upon mutually active natural, physical, chemical and biological phenomena, offers an attractive field of study to the statistician. For the professional statistician has been concerned always with "quantitative data affected by a multiplicity of causes" 1 and with their elucidation.
In considering the causes operating to produce relatively good or bad waters, such as rainfalls, pollution, purification, etc., and their interpretation upon the basis of laboratory findings and personal surveys, it becomes manifest that problems of water supply quality fall well within the scope of statistical method. Just as in all statistical problems, so in that of water supply quality, the investigator is confronted with the two-fold task of determining the method of evaluating the units of interpretation and of defining the limiting values of such units. The method of approach to each problem involves a statistical viewpoint, as well as a quantitative methodology. The present paper has been prepared in order to illustrate, in as brief terms as possible, this statistical method of approach, by developing therein a few examples of its application to the question of water supply quality.

The writer plans to trace the evolution of the concept of water supply quality in the sanitarian's mind and to point out in such a development the function which the statistical art has performed or may be expected to supply in the future. The discussion appears to be a necessary one since hitherto the water supply investigator has been accused of an aversion for the quantitative sciences, while, on the other hand, the professional statistician has shown a neglect of a field which perhaps did not appear to be worthy of his mettle. The present study may serve to remove this friendly distrust which retards in a degree progress in critical studies of water supply quality.

I. THE LABORATORY EXAMINATION OF WATER SUPPLIES

The sanitary quality of water supply must be predicated necessarily upon the demonstration of its relative inability to produce disease.
With the present germ theory of disease, such a demonstration resolves itself into the laboratory problem of enumerating the number and types of pathogenic organisms in stated quantities of water. It is apparent, therefore, that the technique in this instance is largely bacteriological, and the discussion, for purposes of simplicity, may be restricted to the problems of the evaluation of bacterial units, as illustrative of the statistical method.

It is manifestly impossible and impracticable to examine an unknown water in such manner as to determine its content of all kinds of bacteria or even of those relatively few classes of specific organisms which it is known are both disease-producing and capable of living in water. It is more desirable as well as convenient, therefore, to choose one family or group of micro-organisms whose natural habitat and life history are similar to the variety of pathogenic organisms and whose detection by laboratory methods is most simple and speedy.

Bacteriologists have concluded that a particular class of bacteria serves as the most convenient index to water supply quality or contamination. They have chosen as this index class or type the so-called colon or bacillus coli group. The B. coli group has been so selected, because its origin is in general the colon or digestive tract of man and its presence is usually indicative of human sewage pollution (the possible and probable existence of colon types in other environments need not concern us at this point).

One of the primary objects of the bacteriologist, therefore, is to differentiate the bacterial species present in a water supply, so as to demonstrate the presence or absence of members of the B. coli group.
In addition, it is necessary to obtain some idea of the relative frequency of such a group, since smaller numbers naturally connote a more remote pollution, due to the dying off of bacteria in the unfavorable environment of water, to the presence of antagonistic life, and to other natural and artificial barriers to its development. The problems arising in the laboratory differentiation of bacterial varieties offer, therefore, material for an initial example.

Two general methods of distinguishing groups of bacteria are available. Both are based upon the method of differences. In the one case, morphological or structural characteristics, and in the other, metabolic distinctions control. Various classifications of the colon group, for instance, are based upon its ability to produce acid and gas from fermentable substances. Investigators have observed that certain types of B. coli ferment such complex organic compounds as sucrose, dulcitol and raffinose while others do not. Differences in the amount and character of gas formation from certain substances distinguish other types of bacteria.

In all classifications, however, it has been recognized that the same group may have a variety of reactions which overlap partially those of other groups. Two types of bacteria, for instance, may both ferment sucrose, but may differ in their effect upon a second or third compound. This gives rise naturally to a vast number of possible combinations between characters, and Levine 2 points out that "as the number of fermentable substances increases, the number of varieties increases geometrically approaching infinity. The number of 'varieties' is given by the formula $2^n$ where $n$ is the number of characters studied. Thus with 8 characters there are 256 possible combinations; this number rises to 1,024 with 10 characters and to 65,536 when 16 characters are observed.
The absurdity of regarding each character as of similar and equal differential value is thus evident."

Levine, as well as other more recent investigators, has concluded that the principle of the correlation of characters should be emphasized in the attempt to distinguish bacterial species. He points out that certain properties have been universally accepted, after long checking, as reliable evidences of bacterial differences. Among such properties, he enumerates the selective dyeing of bacteria, their powers of spore formation, and their adaptation to aerobic or anaerobic development. The taxonomic value of the characters of motility, indol formation, and fermentation of certain compounds, on the other hand, he assumes to be still debatable. In order to avoid the adoption of a confusing classification of bacteria upon the basis of every character studied (of which we have indicated only a few) he has recourse to a basis of subdivision "on that character which gives the greatest amount of information as to the manner in which the resulting sub-groups react with respect to other characters." 2

By making use of the above principle Levine evolves a classification of coli-like bacteria which is based almost completely upon statistically evaluated correlated characters. For the purpose of this study, he recognizes two main strains of bacteria, the B. coli and the B. aerogenes-cloacae group, which earlier investigations have shown to be distinguishable most often by their reactions to methyl-red and to the Voges-Proskauer reagent. The first strain is usually methyl-red positive and Voges-Proskauer negative, while the second strain shows the reverse. The justification of this initial subdivision into two main groups consists in the fact that the strains thus subdivided show end products of carbohydrate fermentations of two entirely distinct kinds.
Levine's procedure consists in tabulating all of the reactions of the organisms studied in each of the above two groups in two different tables, from which are calculated the coefficients of correlation for each pair of characters. He selects, then, for subdivision that character which gives the highest coefficient of correlation with the greatest number of other characters. For these resulting sub-groups new correlation tables are prepared and further subdivision is made. These sub-groups are regarded as species and each is assigned its name.

In order to illustrate Levine's use of the coefficient of correlation for taxonomic purposes, let us follow his procedure in the subdivision of the B. coli, or methyl-red positive and Voges-Proskauer negative, group of bacteria. For the 182 strains of this group that were studied by means of microscopic and metabolic methods, the coefficients of correlation shown in Table I were obtained.

TABLE I. COEFFICIENTS OF CORRELATION OBTAINED FROM PAIRS OF CHARACTERS AMONG 182 STRAINS OF THE B. COLI GROUP

             Motility   Indol   Sucrose   Raffinose   Dulcitol   Glycerol   Salicin
Motility        ..      -.39     +.53       +.43        +.53       +.18      +.40
Indol          -.39      ..      +.08       +.00        +.02       -.28      +.76
Sucrose        +.53     +.08      ..        +.99        +.58       -.38      +.20
Raffinose      +.43     +.00     +.99        ..         +.58       -.29      +.27
Dulcitol       +.53     +.02     +.58       +.58         ..        -.21      +.60
Glycerol       +.18     -.28     -.38       -.29        -.21        ..       +.52
Salicin        +.40     +.76     +.20       +.27        +.60       +.52       ..

Since Levine's criterion for the choice of a character for subdivision is that that character should give the highest coefficient of correlation with most other characters, it is apparent, from an inspection of Table I, that sucrose, raffinose, dulcitol, and salicin meet this criterion more completely than do other properties. For special technical reasons, Levine chooses sucrose for primary division of the B.
coli group and obtains by differentiation on sucrose ninety-three strains of the sucrose positive and eighty-nine strains of the sucrose negative groups. These two groups combined form, of course, the total of 182 strains initially chosen for study. Further study of the sucrose positive strains discloses a series of coefficients of correlation of characters as shown in Table II.

TABLE II. COEFFICIENTS OF CORRELATION FOR PAIRS OF CHARACTERS AMONG 93 SUCROSE POSITIVE STRAINS OF THE B. COLI GROUP

             Motility   Indol   Dulcitol   Glycerol   Salicin
Motility        ..      -.27      +.67       +.40      +.54
Indol          -.27      ..       +.05       -.42      +.28
Dulcitol       +.67     +.05       ..        -.32      +.39
Glycerol       +.40     -.42      -.32        ..       +.32
Salicin        +.54     +.28      +.39       +.32       ..

Table II indicates that motility is the best correlated character and this property provides, therefore, for two further sub-groups, a sucrose-positive motile sub-group and a sucrose-positive non-motile sub-group. These sub-groups are treated in the manner already illustrated and the coefficients of correlation for different characters provide for further subdivision. With the aid of this statistical interpretation of his studies of 333 coli-like bacteria, isolated from various sources, Levine suggests a classification of bacterial varieties. The summary of this classification need not be repeated here, since the reader is interested more in his method of attack than in the resulting bacteriological findings.

Such classifications as Levine's supply the sanitarian with the qualitative information necessary for the interpretation of one phase of the water supply quality problem.* The analyst dealing with waters is concerned not only with the nature of the bacterial types present therein, but also with the magnitude of their content, since it is the latter which indicates the degree and the remoteness of pollution.
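Levine's selection rule, as summarized above, can be sketched in a few lines of Python. The scoring used here (the sum of the absolute coefficients of each character with all the others) is an assumption made for illustration; the text says only that the chosen character should give the highest coefficient of correlation with the greatest number of other characters. The name `rank_characters` is hypothetical, and the coefficients are those of Tables I and II.

```python
from itertools import combinations


def rank_characters(names, upper_triangle):
    """Rank characters by the sum of |r| with every other character.

    upper_triangle lists the coefficients row by row above the diagonal,
    i.e. in the order produced by combinations(range(n), 2).
    The sum-of-|r| score is an illustrative assumption, not Levine's stated rule.
    """
    pairs = list(combinations(range(len(names)), 2))
    if len(pairs) != len(upper_triangle):
        raise ValueError("expected n*(n-1)/2 coefficients")
    totals = [0.0] * len(names)
    for (i, j), r in zip(pairs, upper_triangle):
        totals[i] += abs(r)
        totals[j] += abs(r)
    return sorted(zip(names, totals), key=lambda item: item[1], reverse=True)


# Table I: 182 strains of the B. coli group.
TABLE_I_NAMES = ["Motility", "Indol", "Sucrose", "Raffinose",
                 "Dulcitol", "Glycerol", "Salicin"]
TABLE_I = [
    -.39, .53, .43, .53, .18, .40,   # Motility with the six later characters
    .08, .00, .02, -.28, .76,        # Indol
    .99, .58, -.38, .20,             # Sucrose
    .58, -.29, .27,                  # Raffinose
    -.21, .60,                       # Dulcitol
    .52,                             # Glycerol with Salicin
]

# Table II: 93 sucrose positive strains.
TABLE_II_NAMES = ["Motility", "Indol", "Dulcitol", "Glycerol", "Salicin"]
TABLE_II = [
    -.27, .67, .40, .54,             # Motility
    .05, -.42, .28,                  # Indol
    -.32, .39,                       # Dulcitol
    .32,                             # Glycerol with Salicin
]

if __name__ == "__main__":
    for title, names, coeffs in (("Table I", TABLE_I_NAMES, TABLE_I),
                                 ("Table II", TABLE_II_NAMES, TABLE_II)):
        print(title)
        for name, score in rank_characters(names, coeffs):
            print(f"  {name:<10} {score:.2f}")
```

On the Table I figures this score places sucrose, salicin, raffinose and dulcitol ahead of the remaining characters, and on the Table II figures it places motility first, in agreement with the choices reported in the text.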
In the search for a potable water, it is often useless to seek that water which has no possible source of contamination, but it is always necessary to determine the quantitative bacterial importance of the latter. The methods so far described answer only one question, that is, what types of bacteria are present in the water. In the solution of the second inquiry, regarding the number of a particular type in a stated quantity of water, statistical method has played recently an important part.

In the simpler tests for the B. coli group in waters, the so-called fermentation-tubes are used. These tubes contain the medium selected for most efficient differentiation of the B. coli group from other kinds of bacteria and are inoculated with specific quantities of the water to be tested. The production of gas in the tubes after stated periods of incubation indicates the presence of the B. coli group. Our knowledge that of five tubes, each inoculated with 0.1 c.c. of the water, four show the presence of the organism, is of value, but more important is the additional fact that such a series of findings indicates that the probable number of organisms in the sample tested is about 1,600 per 100 c.c. This conversion of qualitative fermentation-tube results into quantitative values is of special interest to the statistician.

In 1915, McCrady 3 showed that "the frequency of the appearance of the fermenting organism in the volume drawn from the sample for the test is an exponential function of the number of such organisms in the sample," and that "every fermentation-tube result, whether simple or compound, corresponds to one most probable number of organisms."

* The subdivisions Levine develops have their importance to the investigator in the fact that species or varieties appear to be somewhat correlated with habitat or source of pollution.
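The figure of about 1,600 per 100 c.c. quoted above can be checked numerically. The sketch below assumes the model attributed to McCrady in this paper: each of the x organisms in a sample of volume V falls independently into a given inoculum of volume v with probability v/V, so that a tube remains negative with probability (1 - v/V)^x. The function name `most_probable_number`, its argument layout and the use of bisection are illustrative choices, not part of the source.

```python
import math


def most_probable_number(results, sample_volume=100.0):
    """Most probable number of organisms in `sample_volume` of water.

    results: iterable of (inoculum_volume, tubes_positive, tubes_total),
             one entry per dilution, in the same units as sample_volume.
    Assumes each organism lands in a given inoculum independently with
    probability inoculum_volume / sample_volume.
    """
    terms = []
    for volume, positive, total in results:
        if not 0 < volume < sample_volume or not 0 <= positive <= total:
            raise ValueError("invalid dilution entry")
        # log(a), where a = 1 - v/V is the chance one organism misses the tube
        terms.append((math.log1p(-volume / sample_volume), positive, total - positive))

    if all(pos == 0 for _, pos, _ in terms):
        return 0.0
    if all(neg == 0 for _, _, neg in terms):
        raise ValueError("all tubes positive: no finite most probable number")

    def score(x):
        # Derivative in x of the log-likelihood
        # sum[pos * log(1 - a**x) + neg * x * log(a)].
        total = 0.0
        for log_a, pos, neg in terms:
            total += neg * log_a
            t = -x * log_a
            if pos and t < 700.0:  # beyond this the term is negligible
                # a**x / (1 - a**x) == 1 / expm1(-x * log(a))
                total -= pos * log_a / math.expm1(t)
        return total

    # The score decreases in x; bracket its root, then bisect.
    low, high = 0.0, 1.0
    while score(high) > 0.0:
        low, high = high, high * 2.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if score(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    # Four of five tubes positive, each inoculated with 0.1 c.c.
    print(most_probable_number([(0.1, 4, 5)]))
    # Five of ten tubes positive, each inoculated with 1 c.c.
    print(most_probable_number([(1.0, 5, 10)]))
```

For a single dilution the routine reduces to the closed form x = log(q/(p+q)) / log(1 - v/V), which gives about 1,609 for the first call (rounded in the text to 1,600) and about 69 for the second. Passing several entries handles compound results at more than one dilution under the same assumption.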
By employing the theory of probabilities, he demonstrates that, given the result "$\frac{p}{p+q}$ in 1 volume," for instance, the corresponding most probable number is given by the solution for $x$ of the equation

$$1 - \left(1 - \frac{1}{V}\right)^{x} = \frac{p}{p+q}.$$

Thus, for the result "five out of ten tubes positive in 1 c.c.," the most probable number is given by solution of the equation $1 - .99^{x} = 5/10$, since $V = 100$ c.c., assumed as the original quantity of water sampled. The equation being solved, $x = 69$, or the most probable number of B. coli in the sample, per 100 c.c.

For compound results, such as $\frac{p}{p+q}$ in 10 c.c., $\frac{r}{r+s}$ in 1 c.c., a more complicated formula is employed which is built up, as follows: 3

For the result $\frac{p}{p+q}$ in 10 c.c. the equation becomes

$$(p+q)(\log .9) = \frac{p \log .9}{1 - .9^{x}}$$

which is obtained by differentiating for a maximum the equation given in the earlier paragraph for the probability of the results.

If the result is $\frac{p}{p+q}$ in 10 c.c., $\frac{r}{r+s}$ in 1 c.c., the equation stands

$$(p+q)(\log .9) + (r+s)(\log .99) = \frac{p \log .9}{1 - .9^{x}} + \frac{r \log .99}{1 - .99^{x}}$$

where
$(p+q)$ = number of tubes inoculated with 10 c.c. of sample,
$(r+s)$ = number of tubes inoculated with 1 c.c. of sample,
$x$ = number of fermenting organisms in 100 c.c. of sample,
$p$ and $r$ = number of tubes giving positive results in 10 and 1 c.c. respectively.

If lower additional quantities of water are tested, extra similar terms are added to each side of the above equation. This equation has been modified by Wolman and Weaver 4 into

$$100(p+q) + 10(r+s) = \frac{100p}{1 - .9^{x}} + \frac{10r}{1 - .99^{x}}$$

since, approximately, $\log .9 = 10 \log .99 = 100 \log .999$.

McCrady published later 5 a series of tables for the rapid interpretation of these results which makes the standardized use of the probable numbers of B. coli possible for the water supply investigator.*

* The assumption of McCrady that the distribution of B.
coli is similar to that in a mixture of a few red balls with many white balls is to be contrasted with the hypothesis of other workers that bacteria are uniformly distributed in water (G. C. Whipple 13). More recent independent investigators, however, confirm McCrady's assumptions.

The work of McCrady was followed by other investigations dealing with the numerical interpretations of B. coli tests, of which the more important are Stein 6, 7, 8, Greenwood and Yule 9, and Wells 10, 11, 12. The results of Stein and Greenwood and Yule, although differing in technique and in additional interesting viewpoints, are in substantial agreement with those obtained by McCrady. Stein 8 adds considerable interesting statistical material to the B. coli problem by introducing the so-called B. coli factor method, in which he considers the most probable number of B. coli per c.c. from the percentage of positive tests, the expected error of results, the study of the distribution of coli during a series of tests, and the "coli characteristic" which attempts to show by one figure, the average coli, the expected error and the variable distribution.

The discussion of the problem by Greenwood and Yule 9 has all the intricacy and mathematical complexity usually associated with Yule's contributions. Their findings, however, agree with those of McCrady and Stein. Greenwood and Yule, for instance, give as their formula for the number of B. coli per c.c., when using several tubes with 1 c.c. each,

$$x = \text{B. coli per c.c.} = 2.3 \log \frac{p+q}{q}$$

whereas McCrady gives for the same condition (using an original size sample of 1,000 c.c.)

$$x = \frac{\log \frac{q}{p+q}}{1{,}000 \log .999} = \frac{\log \frac{q}{p+q}}{1{,}000\,(-.0004344)} = \frac{\log \frac{q}{p+q}}{-.4344} = -2.3 \log \frac{q}{p+q} = 2.3 \log \frac{p+q}{q}.$$

Perhaps the mathematician's interest may be aroused to the sanitarian's problems of water supply by the mere examination of Greenwood and Yule's discussions, while the bacteriologist may view with some alarm the same paper. It should be postulated in either case, however, that superficial considerations should not prevent the mutual aid which these two branches of science may extend to each other. While such complexity of treatment of the numerical interpretation of fermentation-tube tests as is indicated by the formula

[formula not legible in the source]

may attract the statistician, it is hoped that it may not at the same time deter the laboratory technician from the adoption of devices which provide for more adequate solutions of his problems. Emphasis must be placed upon the fact that the mental attitude resulting from the adoption of statistical method has much promise in a field of endeavor where laboratory findings are too infrequently tested for accuracy of interpretation and rarely treated as examples of mass phenomena. The work of such men as Stein and McCrady has done much to introduce such methods by clarifying our concepts of fermentation tube results and their relative significance.

That the statistical method is an important asset in the exposition of laboratory findings is illustrated in another series of studies of various phases of water supply. Whipple 13, for instance, has demonstrated that "if, in a series of daily observations of the number of bacteria in a filter effluent extending over a year the deviation of any determination from the mean should be found to be more than five times as much as the probable error, to use a round number, this should be rejected from the series as being, for some reason or other, abnormal."
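Whipple's rejection rule can be sketched as follows. Two assumptions are made here that the quotation does not settle: the probable error is taken as that of a single observation under the normal law (0.6745 times the standard deviation), and the mean and standard deviation are computed from the whole series, including any suspect value. The function name `reject_abnormal` and the daily counts are hypothetical.

```python
import statistics

# Under the normal law, probable error = 0.6745 * standard deviation.
PROBABLE_ERROR_FACTOR = 0.6745


def reject_abnormal(values, multiple=5.0):
    """Split a series into kept and rejected observations.

    An observation is rejected when its deviation from the mean exceeds
    `multiple` times the probable error of a single observation.
    """
    mean = statistics.mean(values)
    probable_error = PROBABLE_ERROR_FACTOR * statistics.stdev(values)
    limit = multiple * probable_error
    kept = [v for v in values if abs(v - mean) <= limit]
    rejected = [v for v in values if abs(v - mean) > limit]
    return kept, rejected


if __name__ == "__main__":
    # Hypothetical filter effluent counts: sixty ordinary days and one abnormal day.
    daily_counts = [20 + (day % 5) for day in range(60)] + [500]
    kept, rejected = reject_abnormal(daily_counts)
    print(len(kept), rejected)
```

With a short series a single extreme value inflates the standard deviation enough to escape the test, which is one reason the rule is stated for observations extending over a year.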
He has made important contributions to the study of the frequency distributions of measures of various bacteriological, biological, and chemical characteristics of water, such as the preliminary finding that extended series of filter effluent results follow definite statistical laws in their distribution. His conclusion has been further substantiated by the more recent study of Wolman 14 of thousands of laboratory findings, in which it is indicated that the logarithms of bacterial counts, through long periods of time, have the characteristic normal probability distribution of more familiar biological statistical data.

It is of considerable interest to refer at this point to a form of graph presentation of data developed by a sanitary engineer which may be unfamiliar to most statisticians. Allen Hazen 15 in 1914 devised a form of chart ruled with a horizontal scale so divided that the curve of probability would plot thereon as a straight line. Any series of observations, therefore, which varied in accordance with the probability law would plot also as a straight line. Illustrations of the use of such paper in water supply problems may be found in the original paper of Hazen 15 and in subsequent discussions by Whipple 13 and Wolman 14.

Stein 16, in his study of the bacterial count in water and sewage, has added considerable material to our conceptions of the variability of laboratory findings and their importance in practical studies. He has concluded, after an interesting detailed analysis of the problem, that:

(a) For platings of a single sample of water, the mean error is equal to the square root of the number of colonies on a single plate, or the square root of the average number of colonies on several plates.
(b) The variations to be expected for careful and accurate work with bacterial counts are indicated by:
    (1) Standard Deviation of ± 12%
    (2) Deviation (1 in 10 times) of ± 25%
For ordinary routine work:
    (1) Standard Deviation of ± 25%
    (2) Deviation (1 in 10 times) of ± 50%

His comparison of the characteristics of bacteriological data with certain mathematical series should be of interest to the reader, since he shows, for example, that for daily tests of Lake Erie water for one month the Lexian Ratio is 29.00 and the Disturbancy Coefficient 124.00, while the corresponding values for a normal mathematical series (Bernoulli) are given as 1.00 and 0.00 respectively.

II. THE INTERPRETATION OF THE QUALITY OF WATER SUPPLIES

In preceding paragraphs the writer has indicated a few of the problems encountered in the laboratory technique of water supply examination, which lend themselves to statistical treatment. It has been impossible to include in the present brief paper any complete survey of such applications to other phases of laboratory procedure, but sufficient material has been presented to demonstrate that the data in the field of laboratory technique have considerable to offer to the professional statistician as bases for the development of interpretative principles of quality.

The writer believes that some mention should be made briefly of certain interesting possibilities of development in the application of statistical method to general problems of laboratory procedure. The use of the coefficient of partial correlation, for instance, does not appear to have been introduced widely in the interpretation of laboratory findings, yet the necessity for its application is most apparent. Often investigative work in water supplies is carried out on a large or plant scale with the aid of analytical laboratory methods.
In the study of the chlorination of a water supply, for example, a number of different variable quantities such as turbidity, color, organic content, and bacterial densities have their effect in modifying the efficiency of the disinfection process. In practically all conclusions from such studies no attempt is made to determine mathematically the effect of such variables, other than by mere inspection of tabulated data. There is little doubt that erroneous conclusions are often obtained through the failure to evaluate quantitatively the importance of fluctuations in the various characteristics of waters subject to chlorination. It is almost impossible to determine by qualitative inspection of a series of daily observations, over an entire year, of temperature, turbidity, color, organic content, and bacterial density in a water supply, whether the effect of a constant dosage of chlorine is influenced more greatly by any one of the above characteristics or by a combination of several or all of them.

The same problem arises, of course, in the study of any of the phenomena associated with the purification of water supplies. In the coagulation of suspended matter in water, for instance, all the variables such as time, agitation, temperature, hydrogen-ion concentration, nature of suspended matter, and character of coagulant play an interconnected part. The principle of partial correlation could be adapted with profit to these problems of associated phenomena.
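As an illustration of what such an analysis involves, the sketch below computes a first-order partial correlation, that is, the correlation of two variables after the linear influence of a third has been removed, using the standard formula $r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The daily figures for turbidity, temperature and bacteria surviving chlorination are hypothetical and serve only to show the mechanics; the function names are likewise illustrative.

```python
import math


def pearson(x, y):
    """Ordinary coefficient of correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear effect of z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical daily observations at a constant chlorine dosage.
    turbidity = [5, 12, 8, 20, 15, 30, 25, 10]
    temperature = [4, 6, 9, 12, 15, 18, 21, 24]
    survivors = [3, 9, 5, 14, 9, 20, 15, 4]

    r_ts = pearson(turbidity, survivors)
    r_tz = pearson(turbidity, temperature)
    r_sz = pearson(survivors, temperature)
    print("simple r (turbidity, survivors):", round(r_ts, 3))
    print("partial r, temperature held constant:",
          round(partial_correlation(r_ts, r_tz, r_sz), 3))
```

Comparing the simple and the partial coefficient shows how much of an apparent association between two characteristics is attributable to a third that varies with both.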
The application of such a statistical principle as pointed out above is complicated, however, by the fact that the more simple statistical coefficients usually cannot be directly applied to the problems encountered, on account of the fact that such measures presuppose the use of data having a symmetrical or Gaussian distribution, while the phenomena with which the sanitarian has to deal often are characterized by asymmetrical distributions. 17, 18

Michael 18 has discussed in this connection the determination of the most probable number of bacteria present in a sample and has demonstrated that it is not permissible to apply the probable error in the usual manner on account of the fact that the logarithms of the plate counts, and not the counts themselves, show a Gaussian frequency distribution 19. McEwen and Michael 17 in another field of investigation have been confronted with the same problem of determining the "functional relation of one variable to each of a number of correlated variables" where such variables do not show the usual symmetrical frequency distribution. It is manifestly impossible to extend in this paper the elucidation of these applications of statistical method to problems of laboratory and plant, but the reader may find profitable data in the original papers already noted.

The opportunity for the application of statistical tests to problems of water supply quality is not restricted, however, to the materials of the analyst. The consideration of the potability of a supply involves always a series of mutually active attributes, each of which has its importance in determining the character of the water. The concept of quality connotes, therefore, a composite of properly weighted individual and fundamental units, in the evaluation of which statistics again comes to the fore.
It is unfortunate, however, that in the field of interpretation of quality statistical method has been even slower of application than in the corresponding study of laboratory data. The quantitative evaluation of sanitary data has always given way to the liberal exercise of expert personal judgment. Where a multiplicity of causes predetermines a phenomenon, such as quality, it was thought that a proper perspective was possible only through the development of a maturity of judgment in which the play of the manifold effects was qualitatively summarized rather than quantitatively analyzed. As the methods of diagnosis of quality developed, however, the opportunity for the fruitful application of the principles of mass phenomena gradually became apparent. With this development of a new viewpoint, good as well as evil sometimes resulted. A complete swinging of the pendulum to the quantitative side of interpretation was feared, where the attempt was made to substitute for individual experience and judgment pseudo-quantitative measures of doubtful significance. Some of these efforts, in which statistical laws frequently were ignored, will be discussed later in this paper. In general, however, a realization is gradually coming over the sanitarian that statistics as a means, rather than as an end, has much to offer in the clarification of his problems. If the succeeding pages seem somewhat bare, in their statistical implication, the professional statistician should remember that the concepts there discussed mark the advance of a new light in sanitary engineering, which, though feeble in its flicker, gives promise of a greater brilliance in the not distant future.

Attempts to formulate water supply standards of composite character represented one of the earliest applications of semi-statistical method.
Most of these were based upon the erroneous conclusion that methods of evaluating units had been standardized throughout the country. Attention has been called to this fallacy of endeavoring to establish limiting values of units attained by varying methods by Hinman 20, Norton 21, and Morse and Wolman 22. Fundamental training in statistical interpretation no doubt would prevent the adoption of water supply quality standards before the principles of unit evaluation have been rigidly enforced.

It is not amiss, perhaps, to call attention at this point to the close analogy between the so-called scoring of a water supply, or the quantitative allocation of the quality upon the scale of sanitary safety, and the statistician's concept of index numbers. Wolman 14 has shown recently that the operations involved in making a price index number are similar to those followed, to a greater or less extent, by investigators of water supply scores. In the case of price index numbers, the object of weighing is to give each commodity included in the index number an influence upon the results corresponding to its commercial importance. In water supply index numbers, the object of weighing likewise is to give each factor making up the score an influence upon the results corresponding to its sanitary importance. Although the problems in the two fields are the same, their solutions are necessarily different, since, in the case of water supply scores, the conversion to a common base of such units as bacterial results, sanitary surveys, operating efficiencies, etc., cannot be carried out because of the presence of varying personal opinion or judgment.
It has been noted 14 , however, that it still remains possible to make use nationally of simplified index numbers of water supply quality restricted in their range of signifi- cance and composed of similar units or, better still, of individual units, provided the method of evaluation of such units has been definitely and completely fixed. Interpretations of the quality of a water include frequently more than a summary of the structural and environmental features of the supply. The possibilities of the intelligent and fruitful application of statistical devices, such as the coefficients of correlation and of varia- tion, to other phases of water supply are mentioned only briefly here, since their complete discussion would involve a paper of a far too great length. Whipple* for instance, has suggested the use of the coefficient of correlation in analyzing the vital statistics of cities which have made changes from poor to good quality water supplies, in order to demon- strate quantitatively the existence of the Mills-Reincke phenomenon. Hazen 15 has made excellent use of statistical method in his analysis of the storage provided in an impounding reservoir on any stream and the quantity of water which can be supplied continuously by it. He introduces the coefficient of variation as a measure of the degree of variation in flows of different streams and by its further use has found it possible to get an approximate expression for the storage required to carry the surplus water of wet years over to dry years, which expres- sion, in general terms, applied equally well to streams in different localities. In addition, he describes methods of estimating the proba- ble errors in the results obtained and makes the important comment that "frank recognition of the large probable errors in many of the results cannot fail to be advantageous." 
15

* Personal communication.

The opportunities for further application of similar methods have appeared in the present writer's studies of the correlation of bacterial contents in water supplies with rainfalls upon stream watersheds and with hygienic resultants of inferior quality such as typhoid fever and diarrhoeal diseases. In these particular studies, the statistician could contribute excellent aid, since the writer is not aware of an effective method of comparing correlated phenomena in which one series of characteristics is continuous, while another is discontinuous. In addition, quantitative variations in magnitude of the values in both series are not of paramount importance, but the direction of such variations is the interesting event. The coefficient of concurrent deviations, in this instance, does not appear to supply all the desiderata.

An example may make our problem clearer. In the study of the daily tap water analyses of a city water supply, we find, by inspection, that the B. coli contents rise after rains on the watershed of the stream supplying the town. It is also found that such rises are masked, to varying degrees, by purification processes and by the efficiency of operation of such processes. If changes in method and efficiency of purification are brought about and the qualitative reflection of rainfalls in resultant B. coli density in tap waters is modified, how can we measure quantitatively the change in sensitiveness of tap water quality to rainfall from month to month? The data at hand for this purpose, reduced to simplest terms, are in each month B. coli values for each day (continuous series), which differ in density from day to day, and rainfall records (discontinuous series) which may give a zero value for all the days but three or four during the month. If, during the month of July, the B. coli per 100 c.c.
rose from 2 to 2,000 from July 7 to July 8, following a rain of 0.8 inch on the stream on July 7, and during August the B. coli per 100 c.c. showed no jumps above 5 in spite of a number of days of rainfall of about 0.8 inch, what should be the statistical relation between the months of July and August for these particular considerations?

This paper should not be concluded without some reference to the part that the study of purification processes has played in modifying and determining the quality of water supplies and the importance therein of the mathematician's tools. It is frequently the sanitarian's problem to include in his valuation of a water's safety some definite estimate, among other things, of the efficiency of operating features involved in the treatment of such a supply. This problem has given rise to various measures of treatment efficiencies, which only recently have been subjected to rigid statistical study. As an illustration of this type of measure the percentage removal of bacteria from untreated to treated waters has persisted. Statistical objections to this measure are well known to the reader and substitutes for this measure of performance, and indirectly of quality, have been much sought after. It was long recognized that the real measure of performance should include data regarding the distribution of the efficiencies over long periods and recommendations suggesting the classification of bacterial results according to frequency distributions have done much to clarify the interpretation of treatment figures.

Further development of the same problem of plant performance along statistical lines has been made by Wolman 23, in the study of the nature of bacterial removal in filtration plants.
In this discussion, it was suggested that "the normal performance of a water filtration plant may be represented by a curve having the equation: $y = x^{c}$, where y and x are respectively the raw water and final effluent counts, and c is a constant for the particular plant under discussion." In other words, the tentative hypothesis was brought forth that the final effluent count, on the average, is an exponential function of the raw water count. The evaluation of "c" replaces also the unsatisfactory percentage efficiency as a more adequate measure, by using the ratio of the logarithms of the counts instead of the ratio of the actual bacterial values.

It is apparent that a measure of performance to be effective for adaptation to quality interpretation should include more than an array of its daily values, since it is the consistency of bacterial removal which predetermines the position of a form of treatment in the scale of the safety of a supply. Heretofore, no single unit of measure of this degree of consistency of removal has been available, although the fitting of normal performance data to the logarithmic curve of filtration supplied at least a graphic method of testing consistency. 23 If bacterial data are arranged and plotted on the probability paper already referred to in the discussion, it becomes extremely easy to obtain the values of the semi-interquartile ranges of the figures in successive steps of purification. The ratio of such values of the ranges for any two steps appears to the writer to present some promise of a real measure of the "levelling" effect of purification processes, since it measures the change produced in the frequency distribution of bacteria in passing through the treatment. The demonstration of its value may be more apparent to the reader by reference to material given elsewhere. 24

REFERENCES

1 Yule, G. U., An Introduction to the Theory of Statistics.
2 Levine, Max, A Statistical Classification of the Colon-Cloacae Group, Journal of Bacteriology, Vol. 3, No. 3, May, 1918.
3 McCrady, M. H., The Numerical Interpretation of Fermentation-Tube Results, Journal of Infectious Diseases, Vol. 17, No. 1, July, 1915.
4 Wolman, Abel and Weaver, H. L., A Modification of the McCrady Method of the Numerical Interpretation of Fermentation-Tube Results, Journal of Infectious Diseases, Vol. 21, No. 3, September, 1917.
5 McCrady, M. H., Tables for Rapid Interpretation of Fermentation-Tube Results, The Public Health Journal (Canada), Vol. 9, No. 5, May, 1918.
6 Stein, Milton F., Making the B. Coli Test Tell More, Engineering News-Record, Vol. 78, No. 8, May 24, 1917.
7 Stein, Milton F., On Numerical Interpretation of Bacteriological Tests, Engineering News-Record, Vol. 82, No. 23, June 5, 1919.
8 Stein, Milton F., The Interpretation of B. Coli Test Results on a Numerical and Comparative Basis, Journal of Bacteriology, Vol. 4, No. 3, May, 1919.
9 Greenwood, J. Junr. and Yule, G. Udny, On The Statistical Interpretation of Some Bacteriological Methods Employed in Water Analysis, Journal of Hygiene, Vol. 16, No. 1, July, 1917.
10 Wells, Wm. F., The Geometrical Mean as a B. Coli Index, Science, N. S., Vol. 47, No. 1202, January 11, 1918.
11 Wells, Wm. F., The Bacteriological Dilution Scale and the Dilution as a Bacteriological Unit, American Journal of Public Health, Vol. 9, No. 9, September, 1919.
12 Wells, Wm. F., On a Standard System of Bacteriological Dilutions, American Journal of Public Health, Vol. 9, No. 12, December, 1919.
13 Whipple, G. C., The Element of Chance in Sanitation, Journal of the Franklin Institute, Vol. 182, No. 1, No. 2, July and August, 1916.
14 Wolman, Abel, Index Numbers and Scoring of Water Supplies, Journal of the American Water Works Association, Vol. 6, No. 3, September, 1919.
15 Hazen, Allen, Storage to be Provided in Impounding Reservoirs for Municipal Water Supply, Trans. American Society of Civil Engineers, Vol. 77, p. 1539.
16 Stein, Milton F., A Critical Study of the Bacterial Count in Water and Sewage, American Journal of Public Health, Vol. 8, No. 11, November, 1918.
17 McEwen, George F. and Michael, Ellis L., The Functional Relation of One Variable to Each of a Number of Correlated Variables Determined by a Method of Successive Approximation to Group Averages: A Contribution to Statistical Methods, Proc. American Academy Arts and Sciences, Vol. 55, No. 2, December, 1919.
18 Michael, Ellis L., Concerning Application of the Probable Error in Cases of Extremely Asymmetrical Frequency Curves, Science, N. S., Vol. 51, No. 1308, January 23, 1920.
19 Johnstone, James, The Probable Error of a Bacteriological Analysis, Rept. Lanc. Sea-Fish. Lab., 1919, No. 27. (Not read.)
20 Hinman, J. J. Jr., American Water Works Laboratories, Journal of the American Water Works Association, Vol. 5, No. 2, June, 1918.
21 Norton, J. F., Comparison of Methods for the Examination of Water at Filtration Plants, Journal of Infectious Diseases, Vol. 23, 1918, pp. 344-50.
22 Morse, Robert B. and Wolman, Abel, The Practicability of Adopting Standards of Quality for Water Supplies, Journal of the American Water Works Association, Vol. 5, No. 3, September, 1918.
23 Wolman, Abel, A Preliminary Analysis of the Degree and Nature of Bacterial Removal in Filtration Plants, Journal of the American Water Works Association, Vol. 5, No. 3, September, 1918.
24 Wolman, Abel and Powell, S. T., Sanitary Effect of Water Storage in Open Reservoirs, Engineering News-Record, Vol. 83, No. 18, October 30-November 6, 1919.
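
SUPPLEMENTARY ILLUSTRATION

As a supplementary illustration of the two measures of plant performance discussed in the closing paragraphs of the paper, the sketch below computes the exponent c from the relation y = x^c and the ratio of the semi-interquartile ranges for two steps of purification. It is a minimal sketch in Python under stated assumptions: the function names and all counts are hypothetical, the quartiles are computed arithmetically by the standard library rather than read from probability paper, and the ranges are taken on the counts as supplied.

```python
import math
import statistics


def filtration_exponent(raw_count, effluent_count):
    """Exponent c in raw = effluent ** c, i.e. log(raw) / log(effluent).

    Follows the text's definition, in which y is the raw water count and
    x the final effluent count. Both counts must be positive, and the
    effluent count must differ from 1 so that its logarithm is not zero.
    """
    if raw_count <= 0 or effluent_count <= 0 or effluent_count == 1:
        raise ValueError("counts must be positive and effluent_count != 1")
    return math.log(raw_count) / math.log(effluent_count)


def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


def levelling_ratio(before, after):
    """Ratio of semi-interquartile ranges, after treatment to before."""
    spread_before = semi_interquartile_range(before)
    if spread_before == 0:
        raise ValueError("the 'before' series has no interquartile spread")
    return semi_interquartile_range(after) / spread_before


# Hypothetical daily counts per c.c. for one week (not observed data).
raw_counts = [100, 200, 400, 800, 1600, 3200, 6400]
effluent_counts = [2, 4, 6, 8, 10, 12, 14]

# A single hypothetical pair of counts: raw 10,000 and effluent 100.
c = filtration_exponent(10000, 100)
ratio = levelling_ratio(raw_counts, effluent_counts)
print(c, ratio)
```

For the hypothetical pair of counts the exponent c is 2, since 100 squared is 10,000. A levelling ratio well below unity, as in this invented week of counts, would indicate that the treatment has narrowed the spread of the bacterial counts; whether the ranges are better taken on the counts or on their logarithms is left open here.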